Abstract

Without assuming any pdf for some measured parameter, we derive a predictive pdf for 
the outcome of a second measurement, given the outcome of the first measurement and two 
common assumptions about the noise. These are that (1) it is additive, and (2) it is of some 
known pdf. The argument is based on a Bayesian analysis of the noise when no pdf is 
provided for the value of the parameter. In this way we avoid assuming an ad-hoc prior. 
We clarify how this method of direct predictive inference is distinct from fiducial prediction. 
We specify the distinct flaw in the fiducial argument, and outline the importance of this 
development in the foundations of probability and statistics. 

Keywords: nonparametric predictive inference, direct pivotal argument, pivotal argument,
fiducial argument, fiducial prediction, Bayesian inference, reference prior, reference class.



1 Outline and Motivation

We examine the problem of deriving a pdf predicting a future outcome $x_2$ based on an earlier outcome
$x_1$ and a model of location measurements,¹ when there is no prior pdf. We consider the solution based on
the distribution of $x_2 - x_1$, which is readily determined. In this way we avoid assuming some "reference
prior" (e.g. [1]). We also dodge the problematic premises of fiducial prediction ([2], [3]; reviewed in [4]);
that is, we can do without parametric inference, unlike in the Bayesian and fiducial treatments.

The intuitive appeal of the "direct" pivotal argument (the term is borrowed from [4], p. 365) is
balanced by an understandable skepticism toward pivotal inference, because an argument of this kind has
also been advanced by Fisher in support of fiducial inference (e.g. [3]). This generated much controversy
in its day and is generally regarded as problematic at best ([5], [4], [6], [7]; even by Fisher himself, in
private communication with Barnard and Savage, cited in [6], p. 381).

In this setting we have a double task. In the first place we have to account for selectively dismissing
fiducial inference but not the pivotal argument of direct predictive inference. We focus on a technical
shortcoming specific to the fiducial argument, in Sec. 2.2.3. This observation is interesting in itself,
because it complements earlier critical reviews of the fiducial argument, like [5], [4], and [6]. On the
other hand, to address certain well-founded concerns, we submit a mathematical demonstration (starting
at Sec. 2.2.4) that the pivotal pdf in the direct pivotal argument is unaffected by our knowing $x_1$.

Doing so, we justify solutions to common practical problems, without resorting to "statistical
principles" and makeshift priors. This treatment also introduces an important theoretical development,
because until now the fiducial argument has posed a profound quandary: it has already been shown that
it leads to contradiction (see [4], Sec. 5) but, inasmuch as no specific step in the fiducial argument has
been identified as being wrong, the foundation of probability theory (not only statistics) is implicated, by
default. In the aftermath, a tacit stopgap injunction to avoid fiducial inference has been generally in
effect, as it were with the force of an ad-hoc modification of the axioms of probability. In consequence,
the process of deduction has been undermined, because new axioms have not been explicitly stated, and,
even if they had, any modified foundation would seem contrived in comparison to the classical one. (This
is most evident in the case of the "direct" pivotal argument, which is similar to fiducial inference yet is
distinct from it, so that it is unclear whether the informal injunction applies.) The present treatment
dissolves this predicament, by isolating the distinct flaw in the fiducial argument, and so restoring the
classical foundation in the theory of probability.

* With a typo correction and minor improvements over earlier drafts (privately communicated). 
§ anakreon@hol.gr

^ 2215 Coover Hall, Iowa State University, Ames, Iowa 50011; berleant@iastate.edu
1 'Location' means that the noise is additive and its pdf does not depend on the measured magnitude. 
2 The main argument



2.1 Fundamental issues related to predictive inference 

To use a plain example, let $\theta$ be an unknown real constant, not the result of any known random
process. Two independent measurements of $\theta$, with outcomes $x_1$ and $x_2$, correspond to conditional
random variables $X_1|_{\{\theta=t\}}$ and $X_2|_{\{\theta=t\}}$. We assume that, conditionally on any
possible value of $\theta$, these are independent and distributed normally:
$X_1|_{\{\theta=t\}} \sim N(t, \sigma_1^2)$ and $X_2|_{\{\theta=t\}} \sim N(t, \sigma_2^2)$, where
$\sigma_1$ and $\sigma_2$ are assumed constant and known. The difference of the two outcomes,
$d = x_2 - x_1$, corresponds to $X_2|_{\{\theta=t\}} - X_1|_{\{\theta=t\}}$, which is distributed normally
with mean $0$ and variance $\sigma_1^2 + \sigma_2^2$, without reference to the value of $\theta$. Therefore
a random variable $D$ can be defined, following $N(0, \sigma_1^2 + \sigma_2^2)$, which is not
conditional on $\theta$. Any random variable with this property is called pivotal or a pivot.
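
The pivotal property just described can be illustrated numerically. The following is only a sketch under assumed values: the noise scales, the hypothetical values of the parameter, the sample size and the function name `simulate_differences` are illustrative choices, not part of the argument. It draws pairs of normal measurements for several values of the parameter and shows that the sample mean and variance of the difference stay close to 0 and to the sum of the two variances.

```python
import random
import statistics


def simulate_differences(theta, sigma1, sigma2, n, seed):
    """Draw n pairs of independent measurements of theta and return x2 - x1."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n):
        x1 = rng.gauss(theta, sigma1)  # first outcome: theta plus additive noise
        x2 = rng.gauss(theta, sigma2)  # second outcome, independent noise
        diffs.append(x2 - x1)
    return diffs


sigma1, sigma2 = 1.0, 2.0            # assumed known noise scales
for theta in (-50.0, 0.0, 1234.5):   # arbitrary hypothetical values of the parameter
    d = simulate_differences(theta, sigma1, sigma2, n=100_000, seed=0)
    # Mean should be near 0 and variance near sigma1**2 + sigma2**2 = 5 for every theta.
    print(theta, round(statistics.fmean(d), 3), round(statistics.pvariance(d), 3))
```

A simulation of this kind only illustrates the pre-data property of D; it does not by itself address the post-data question taken up next.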

The pdf of $D$ can be directly applied to generate the pdf for "$x_2$, given $x_1$" according to what has
been called the "direct" pivotal argument (as distinct from fiducial prediction) in [4] (p. 365). But this
is not the end of the issue, because this argument relies on the premise that the pdf of
$D|_{\{x_1=s\}}$ is the same as the pdf of $D$. To defend this claim, it is not enough to know that the
distribution of $D$ is independent of $\theta$; one must also demonstrate that knowledge of $x_1$ does not
make any difference (that is, $x_1$ by itself does not specify any recognizable subset of the reference
class (or "reference set") associated with $D$ (e.g.
CD pp. 57-58, GD, or 0)). 

The problem has persisted for decades, because, lacking a mathematical demonstration of this claim
([5] Sec. 4; [6] Sec. 7.2), conjectures about the post-data pdf of a pivot have been presumed by some
authors on the basis of an intuitive conviction, starting with Fisher (e.g., see [6] Sec. 7.2), who has evoked
a general version of this assertion in support of his fiducial argument. The inadequacy of a bare appeal
to intuition has been outstanding since Fisher's assertion was shown to be wrong in a particular
situation [8], which involves the t-distribution (for an outline see [4], p. 364).

Nevertheless, this finding does not extend to our example. Moreover, in this work we submit a proof
backing the claim that, in the case of location measurements, $x_1$ is irrelevant (by itself) to the pdf of $D$.

2.2 General considerations 

2.2.1 Additive noise as a pivotal random variable 

Each location measurement (here we label them by $i = 1, 2, \ldots$) can be thought of as involving a
random process, generating noise of a known pdf $f_i(e_i)$, which is then added to the unknown parameter
$\theta$ to provide an outcome $x_i$. Our assignment of a pre-data pdf to $E_i$ is based on this assumption,
which we call Assumption $B_i$, or simply $B_i$.

We shall employ a detailed notation for probability statements, to display the defining assumptions
(the "context" or "reference class") which label the corresponding probability space. For instance, to state
explicitly that the pre-data pdf of $E_1$ does not depend on (hypothetical values of) $\theta$, we write

$$(\forall t)\quad \Pr_{E_1}(e_1 \mid B_1, "\theta=t") = \Pr_{E_1}(e_1 \mid B_1) = f_1(e_1). \tag{1}$$

(That is, $E_1$ is a pivot.)

The suffix in a probability statement has a double role. Not only does it denote the random variable
(here: $E_1$) which is associated with the indicated value (here: $e_1$), but also, when the random variable
is continuous, it specifies the parametrisation used to represent the probability density function as a
regular function.

In the following we shall focus on determining the post-data pdf of the error of measurement. This
pdf can be expressed as $\Pr_{E_1}(e_1 \mid B_1, "x_1=s")$.

2.2.2 Disregarding the true value of the parameter 

In the expression $\Pr_{E_1}(e_1 \mid B_1, "x_1=s")$, the reference class is specified by two conditions:
an assumption regarding the noise of this type of measurement, and an acceptance of the outcome of this
particular measurement.

As a technical exercise, if in the specification of the reference class we also included the hypothetical
"true value of $\theta$", as in $\Pr_{E_1}(e_1 \mid B_1, "x_1=s", "\theta=t")$, the pdf would collapse into
the delta function $\delta(e_1 - s + t)$, which is unspecified (because $t$ is unknown), therefore useless
for predictive inference (besides, it does not contain any trace of the known properties of the measurement).

However, the general issue of selecting the reference class is still open; at any rate, it cannot be
decided within the theory of probability. We cannot prove that it is wrong to include "$\theta=t$" in the
specification of the reference class. We rather point out some counterintuitive consequences of this choice,
ultimately related to practical disadvantages.

A simple version of the same problem will arise if one tosses a fair coin, and immediately covers it
with a bowl. Do we accept that the reference class consists of all such trials (disregarding that either
'heads' or 'tails' has already become a constituent of reality) so that the probability of
heads-under-the-bowl is 0.5, or do we restrict the reference class to this single case, so that probability
cannot be defined (except trivially)? With the first option, we stand to gain (in the long run) from betting
against someone who wrongly believes that the probability of heads-under-the-bowl is 0.6. Not so if we
follow the second option.
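
The long-run gain mentioned for the first option can be made concrete with a small expected-value sketch. The betting arrangement is an assumption not spelled out in the text: the opponent, believing the probability of heads to be 0.6, is taken to pay 0.6 units for a ticket that returns 1 unit on heads. The function name `expected_gain_per_bet` is hypothetical.

```python
def expected_gain_per_bet(true_p_heads, price_paid_by_opponent):
    """Expected gain from selling a ticket that pays 1 unit if the coin shows heads.

    We receive the price up front and pay out 1 unit with probability true_p_heads.
    """
    return price_paid_by_opponent - true_p_heads * 1.0


# Fair coin (0.5) against an opponent who regards 0.6 as a fair price.
print(round(expected_gain_per_bet(0.5, 0.6), 2))
```

Under these assumptions the seller's expected gain is positive on every covered toss, which is what "in the long run" refers to.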

The same problem arises in any application of Bayes' theorem, even if it is based on well-defined prior 
probability. Take for instance the interpretation of a medical diagnostic test, such as an HIV-antibody 
test. The lab procedure outputs the relative likelihood of infection, which then can be combined with 
prior probability based on information about the subject's lifestyle, using statistical tables, to derive the 
posterior probability of HIV infection. This analysis assumes that the reference class is the set of people 
with the same lifestyle. On the other hand, if one refuses to relax the consideration that the particular 
subject is either already infected or not infected, the probability of HIV infection cannot be defined (except 
trivially, that is, "either 0 or 1").
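
The combination of a relative likelihood with a prior probability described above is Bayes' theorem in odds form. A minimal sketch follows; the numbers are purely illustrative and are not taken from any real diagnostic test or statistical table, and the function name `posterior_probability` is hypothetical.

```python
def posterior_probability(prior, likelihood_ratio):
    """Bayes' theorem in odds form: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# Illustrative values only: a prior of 1% for the subject's reference class and
# a test outcome that is 100 times more likely under infection than otherwise.
print(posterior_probability(prior=0.01, likelihood_ratio=100.0))
```

The prior passed to the function is exactly where the choice of reference class enters: a different class of subjects gives a different prior and hence a different posterior for the same test outcome.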

In view of these consequences of our options, we elect to include (or imply) the clause "disregarding
the true value of $\theta$" in the interpretation of a pivotal random variable, after the outcome is known,
so that we prevent the collapse into a trivial reference class. A deliberate omission of this clause would
amount to voiding the pdf of $E_1|_{\{x_1=s\}}$.

2.2.3 The impossibility of a fiducial argument 

Fisher has emphasised repeatedly that the fiducial assignment of a pdf to the parameter requires a
carefully considered specification of the reference class. (He uses the terms 'aggregate', 'population', and
'reference set', as cited in [6].) However, when we follow this advice, we find that the fiducial argument is
not sustainable.

Suppose (for the moment) that we have established the assignment of a pdf to $E_1|_{\{x_1=s\}}$. We are
not thereby justified to pair it with a corresponding pdf for $\theta$, because it would conflict with the
clause "disregarding the true value of $\theta$" which is implied in the specification of the reference
class. A striking incongruity looms in the sentence "the probability of $\theta$ being between $t_1$ and
$t_2$, disregarding the value of $\theta$, is 0.95".

Here we divorce direct predictive inference from the fiducial argument, in terms of logical connection,
yet it can be said that the two schemes are related in intention. In the words of A. P. Dempster [9],
"fiducial probabilities are intended for post-data predictive interpretation". Also F. Hampel [10] focuses
on the predictive role of fiducial probabilities. We outline Fisher's position in Sec. 

2.2.4 Determination of post-data probability for additive noise 

In relation to the issue "what is the pdf of $E_1|_{\{x_1=s\}}$?" one may also ask "how is this pdf updated
if, on the next day, we learn that $\theta$ had been a random outcome of some process, with pdf
$\pi(\theta)$?". The idea is that Bayesian updating applies in this case, as if the object of the
measurement were the value of the noise, $e_1$, and the direct information about $\theta$ were only part of
the measurement process.² The pdf of $E_1|_{\{x_1=s\}}$ will be identified with the prior of the alternate
Bayesian treatment (since it is meant to apply when we lack any direct information about $\theta$).

2 In the words of Jaynes: "But a telescope maker might see it differently. For him, the errors it
produces are the objects of interest to study, and a star is only a convenient fixed object on which to

To present this argument clearly, let us denote by $\mathcal{H}$ the assumption that $\theta$ is the result
of a random process, corresponding to a random variable $\Theta$, of pdf $\pi(\theta)$. If $\mathcal{H}$ is
accepted, then use of Bayes' theorem is justified for probability update, assuming an outcome $x_1$.

In our case we also have the assumption of a location measurement. It is expressed in Eq. (1), which
states the requirement that the pre-measurement pdf of the noise ($E_1$) be independent of [any
hypothetical value that might be supposed of] $\theta$. Note that the converse property is also true,
trivially:

$$(\forall e)\quad \Pr_{\Theta}(\theta \mid \mathcal{H}, "e_1=e") = \Pr_{\Theta}(\theta \mid \mathcal{H}) = \pi(\theta). \tag{2}$$

That is, the prior pdf associated with $\theta$ is independent of [any hypothetical magnitude that might be
supposed of] the error of this measurement.

Note the formal symmetry between $(\theta, \mathcal{H})$ and $(e_1, B_1)$, by comparing Eq. (1) with
Eq. (2).³ Therefore there are two Bayesian ways of deriving
$\Pr_{E_1}(e \mid B_1, \mathcal{H}, "x_1=s")$. As a consistency check, let us compare the results of the two
corresponding treatments.

A. The usual Bayesian treatment 

In the usual treatment we first update the pdf of $\Theta$, from $\pi(\theta)$ to the corresponding
posterior pdf. We apply the familiar Bayesian formula

posterior pdf = prior pdf × likelihood × normalising constant.

The likelihood function for $\theta$, given "$x_1 = s$", is defined up to an unimportant factor:

$$L_{\theta\{x_1=s\}}(t) \propto \Pr_{X_1}(s \mid B_1, "\theta=t"). \tag{3}$$

Considering the transformation from $X_1|_{\{\theta=t\}}$ to $E_1 = X_1|_{\{\theta=t\}} - t$, of Jacobian
determinant 1, we obtain

$$\Pr_{X_1}(s \mid B_1, "\theta=t") = \Pr_{E_1}(s-t \mid B_1, "\theta=t"). \tag{4}$$

From Eqs. (3), (4) and (1) we obtain

$$L_{\theta\{x_1=s\}}(t) \propto f_1(s-t). \tag{5}$$

Note that the definition of the likelihood function does not involve $\mathcal{H}$. (That is, with a
different prior pdf for $\theta$, or in default of any prior pdf, the likelihood function would be the same.)

The posterior pdf of $\Theta$ is⁴

$$\Pr_{\Theta}(t \mid B_1, \mathcal{H}, "x_1=s") \propto L_{\theta\{x_1=s\}}(t)\,\pi(t) \propto f_1(s-t)\,\pi(t).$$

focus his instrument for the purpose of determining those errors. Thus a given data set might serve 
two entirely different purposes; one man's 'noise' is another man's 'signal'" [11], Ch. 7. (Also in Ch. 8 he
observes the mathematical "reciprocity" between random variable and any ancillary random variable.) 

3 This symmetry may be clouded because of certain properties that typically are desired for $f_1(e_1)$
without being required of it: we prefer that it average to zero, and that it also be symmetrical about
zero; moreover, it is convenient that it follow the normal distribution. However, these properties being
nonessential, there is no real issue here. Besides, when we want to perform a direct zero calibration of
the apparatus, we usually select a $\pi(\theta)$ having the above properties.

4 Strictly speaking, if $f_1(\cdot)$ is smooth, Bayesian updating cannot be based on the acceptance of
"$x_1 = s$" because the probability of that occurrence is zero for all values of $\theta$. Instead of an
exact value for $x_1$, we consider some small interval including that value. In the first-order
approximation we reckon the (conditional on $\theta$) probability of this interval as the product
$\Pr_{X_1}(s \mid B_1, "\theta=t") \times \Delta x_1$. Therefore if the interval is small enough we apply the
likelihood function defined in Eq. (3) as the first-order approximation. In fact it is never possible to
record $x_1$ exactly; it is always registered as a digitised entry, which is equivalent to some interval.
Now we take advantage of the transformation from $\Theta$ to $E_1$, which is defined by
$E_1 = s - \Theta$, of Jacobian determinant 1, to derive

$$\Pr_{E_1}(e \mid B_1, \mathcal{H}, "x_1=s") = \Pr_{\Theta}(s-e \mid B_1, \mathcal{H}, "x_1=s") \propto f_1(e)\,\pi(s-e). \tag{6}$$



B. The "instrument maker's" Bayesian treatment 

In the alternate Bayesian treatment we update the pdf of $E_1$, from $f_1(e_1)$ to the corresponding
posterior pdf. Again we make use of a likelihood function, but now it is the likelihood function for $e_1$,
given "$x_1 = s$". In analogy with the previous treatment, we define the likelihood function for $e_1$,
given "$x_1 = s$", without reference to $B_1$:

$$L_{e_1\{x_1=s\}}(e) \propto \Pr_{X_1}(s \mid \mathcal{H}, "e_1=e").$$

Considering the transformation from $X_1|_{\{e_1=e\}}$ to $\Theta = X_1|_{\{e_1=e\}} - e$, we obtain

$$\Pr_{X_1}(s \mid \mathcal{H}, "e_1=e") = \Pr_{\Theta}(s-e \mid \mathcal{H}, "e_1=e") = \pi(s-e)$$

(in the second step we have taken into account Eq. (2)), so that the likelihood function is

$$L_{e_1\{x_1=s\}}(e) \propto \pi(s-e).$$

Now we can derive the posterior pdf of $E_1$, up to a normalisation factor:

$$\Pr_{E_1}(e \mid B_1, \mathcal{H}, "x_1=s") \propto L_{e_1\{x_1=s\}}(e)\,f_1(e) \propto \pi(s-e)\,f_1(e). \tag{7}$$

As expected, Eq. (7) is equivalent to Eq. (6), so that the "instrument maker's" version of Bayesian
updating is checked as accurate. The important point is that the prior pdf in this procedure, which is
meant to apply as long as no direct information about $\theta$ is (yet) available, is just $f_1(e_1)$.⁵ In
other words, we have determined the pdf of $E_1|_{\{x_1=s\}}$, and it turns out the same as the
pre-measurement pdf of $E_1$.

We have shown that $x_1$ does not specify any recognizable subset of the reference class that is indicated
by the clause "disregarding the unknown value of the parameter", with regard to random variable $E_1$.
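
The agreement between treatments A and B can also be checked numerically in a special case. The sketch below assumes, purely for illustration, zero-mean normal noise and a normal prior for the parameter; neither is required by the argument, and the function names and numerical values are hypothetical. Treatment B is evaluated by normalising the product of prior and noise pdf on a grid, while treatment A uses the standard closed-form normal-normal posterior for the parameter, mapped through e = s - theta.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd**2) at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def posterior_error_moments(s, sigma, prior_mean, prior_sd, half_width=12.0, n=20001):
    """Treatment B: posterior(e) proportional to pi(s - e) * f1(e), normalised on a grid.

    Returns the posterior mean and variance of the error E1 given x1 = s.
    """
    step = 2.0 * half_width * sigma / (n - 1)
    grid = [-half_width * sigma + i * step for i in range(n)]
    w = [normal_pdf(s - e, prior_mean, prior_sd) * normal_pdf(e, 0.0, sigma) for e in grid]
    total = sum(w)
    mean = sum(e * wi for e, wi in zip(grid, w)) / total
    var = sum((e - mean) ** 2 * wi for e, wi in zip(grid, w)) / total
    return mean, var


def conjugate_error_moments(s, sigma, prior_mean, prior_sd):
    """Treatment A: normal-normal posterior for theta, then the map e = s - theta."""
    precision = 1.0 / sigma**2 + 1.0 / prior_sd**2
    theta_mean = (s / sigma**2 + prior_mean / prior_sd**2) / precision
    return s - theta_mean, 1.0 / precision


# Illustrative values: outcome s = 3, noise sd 1, prior N(1, 2**2).
print(posterior_error_moments(3.0, 1.0, 1.0, 2.0))
print(conjugate_error_moments(3.0, 1.0, 1.0, 2.0))
```

Both routes should give the same mean and variance for the error, up to the accuracy of the grid.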

2.2.5 Justification of the direct pivotal argument 

A corollary of the irrelevance of $x_1$ to the pdf of $E_1$ is that $x_1$ is also irrelevant to the pdf of
$D$ (defined in Sec. 2.1). This is due to the identities

$$d = x_2 - x_1 = e_2 - e_1,$$

so that

$$D = E_2 - E_1.$$

We have already seen that the pdf of $E_1$ is unaffected by our knowing $x_1$, and of course so is the pdf of
$E_2$. Consequently, so is the pdf of $D$. In notation,

$$\Pr_D(d \mid B_1, B_2, "x_1=s") = \Pr_D(d \mid B_1, B_2). \tag{8}$$

We have solved the problem stated in Sec. 2.1 regarding the post-measurement pdf of $D$. In this way
we have provided the foundation for the direct pivotal argument, so that we can produce the pdf for "$x_2$
given $x_1$, disregarding the unknown value of $\theta$".

5 If there is any doubt whether the pdf for the error is legitimate when we do not know of any pdf for
$\theta$, let us refer to the symmetrical situation, when we admit a prior pdf for the parameter regardless
of the properties of the measuring apparatus, even regardless of whether there will be any measurement at
all.
To obtain a concrete result, note that the pdf of $D$ is the marginal pdf of $E_2 - E_1$ for any value of
$\theta$,

$$\Pr_D(d \mid B_1, B_2) = \int d\xi\, f_1(\xi)\, f_2(d+\xi), \tag{9}$$

where $\xi$ is a dummy variable. Considering the transformation from $D|_{\{x_1=s\}}$ to
$X_2|_{\{x_1=s\}}$, we conclude that

$$\Pr_{X_2}(x_2 \mid B_1, B_2, "x_1=s") = \int d\xi\, f_1(\xi)\, f_2(x_2 - s + \xi). \tag{10}$$

This result is based on disregarding the true value of the parameter. By coincidence, fiducial prediction
also results in the same pdf, which also coincides with the predictive pdf based on a uniform prior
density for the parameter.
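
As a hedged illustration of Eq. (10), the integral can be evaluated numerically for a particular choice of noise pdfs. The sketch assumes zero-mean normal noise for both measurements (an assumption made only for this example), in which case the predictive pdf should equal the normal density with mean s and variance equal to the sum of the two noise variances, as in the example of Sec. 2.1. The function names, the integration range and the numerical values are illustrative.

```python
import math


def normal_pdf(x, sd):
    """Density of a zero-mean normal with standard deviation sd at x."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def predictive_pdf(x2, s, f1, f2, lo, hi, n=20001):
    """Eq. (10): integral over xi of f1(xi) * f2(x2 - s + xi), by the trapezoidal rule."""
    step = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        xi = lo + i * step
        weight = 0.5 if i in (0, n - 1) else 1.0  # trapezoidal end-point weights
        total += weight * f1(xi) * f2(x2 - s + xi)
    return total * step


sigma1, sigma2, s = 1.0, 2.0, 10.0       # assumed noise scales and first outcome
f1 = lambda e: normal_pdf(e, sigma1)     # noise pdf of the first measurement
f2 = lambda e: normal_pdf(e, sigma2)     # noise pdf of the second measurement
for x2 in (8.0, 10.0, 13.0):
    numeric = predictive_pdf(x2, s, f1, f2, lo=-12.0, hi=12.0)
    closed = normal_pdf(x2 - s, math.sqrt(sigma1**2 + sigma2**2))
    print(x2, numeric, closed)
```

For noise pdfs that are not normal, only `f1` and `f2` need to be replaced; the integration range would then have to be chosen to cover the support of the first noise pdf.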



3 Discussion 

Although the "direct" pivotal argument applies only with location measurements (a special case, even
if not too uncommon) the importance of this analysis lies in showing an example of non-parametric
predictive inference based on parametric models.

In another paper we shall extend this result in two ways: the predicted outcome need not be related 
to a location measurement, and the prediction may be based on any number of location measurements. 
However modest those developments appear in relation to the general case, the issues raised by them 
require careful treatment, so that they cannot be addressed in a short paper like the present one. 

Here is a note regarding the distinction between direct and fiducial prediction. Fisher has not
overlooked that the problem of fiducial predictive inference based on datum $x_1$ can be solved "directly",
that is, not only "after the [...] distribution of the population parameter[...] has been obtained" ([2],
Sec. II); in other words: "without discussing the possible values of the parameter $\theta$" ([3],
Sec. V.3). Yet he defines fiducial prediction as derived from fiducial probability of the parameter values;
consequently the simplification he mentions is only a secondary issue. In the present work predictive
inference is defined in the absence of any distribution for $\theta$; therefore the possibility to also
calculate it as if from some intuitive density function of the parameter is fortuitous, proved in the case
of location measurements but not yet guaranteed to be generally true.



4 Conclusions 

The error (or "noise") of a location measurement corresponds to a pivotal random variable. There is 
an issue regarding what is the appropriate reference class for interpreting this random variable after the 
outcome is known. We show that, if we want a non-trivial and a practically useful result, the reference 
class must be specified by the clause "disregarding the unknown value of the parameter". In this way we 
also preserve correspondence with the common usage of the term 'probability'. This clause prevents the 
application of the fiducial argument, so that no pdf for the parameter may be justified; fiducial prediction 
is also voided by this clause. However, the direct pivotal argument remains valid. It solves the problem 
of predictive inference, for location measurements, without any intermediate parametric inference. In this
way we have attained "pure" predictive inference; that is, not involving any inductive component; every 
step involves deduction only. 